Group 2

Introduction

Data sourced from paper:

“Apoptosis and other immune biomarkers predict influenza vaccine responsiveness”

Focus of current project:

  • Clean and augment data for analysis

  • PCA

  • Effect of age on antibody response to the vaccine

  • Prediction of vaccine response based on probe signal

Data

Flow chart

                                     ┌──────────────┐
                                     │   Raw Data   │
                                     └─────┬────────┘
                                           │
                         ┌─────────────────┴────────────────┐
                         │                                  │
                         v                                  v
            ┌──────────────────────────────┐   ┌───────────────────────────┐
            │ Raw Expression Files         │   │ Raw Metadata              │
            │ (GSE41080_RAW.tar)           │   │ (msb201315-s2.csv)        │
            └──────────────┬───────────────┘   └──────────────┬────────────┘
                           │                                  │
                           │  01_load                         │
                           v                                  v
            ┌──────────────────────────────┐   ┌───────────────────────────┐
            │ Load & Extract               │   │ Load Metadata             │
            │ Normalize Columns            │   │ Clean Variables           │
            └──────────────┬───────────────┘   └──────────────┬────────────┘
                           │                                  │
                           │  02_clean                        │
                           v                                  v
            ┌──────────────────────────────┐   ┌───────────────────────────┐   
            │ Filter Probes by Mean p < .05│   │ Remove Unneeded Fields    │
            │                              │   │ (Season, VacType, etc.)   │
            └──────────────┬───────────────┘   └─────────────┬─────────────┘
                           │                                 │
                           └──────────────────┬──────────────┘
                                              │  03_augment
                                               │
                             ┌─────────────────┴───────────────────┐
                             │                                     │
                             v                                     v
             ┌───────────────────────────────┐     ┌────────────────────────────┐
             │ Pivot Expression Wide         │     │ Define Response Group      │
             │ (samples × probes)            │     │ Good vs Poor               │
             └──────────────┬────────────────┘     └──────────────┬─────────────┘
                            │                                     │
                            └─────────────────────┬───────────────┘
                                                  │
                                                  │  
                                                  v
                               ┌──────────────────────────────────┐
                               │   Merge Expression + Metadata    │
                               │   by Sample ID (file/GSM)        │
                               └──────────────────┬───────────────┘
                                                  │
                                                  │  04_describe
                                                  v
                               ┌──────────────────────────────────┐
                               │      Final Analysis Dataset      │
                               │ (Demographics + Expression +     │
                               │    Response Phenotype + Titers)  │
                               └──────────────────┬───────────────┘
                                                  │  05_analysis1
                                                  v
                  ┌──────────────────────────────────────────────────────────┐
                  │                    Downstream Analyses                   │
                  │  06_analysis2: PCA & Descriptive stats                   │
                  │  07_analysis3: Volcano plot & Biomarker discovery        │
                  │                      (t-tests, FC, FDR)                  │
                  └──────────────────────────────────────────────────────────┘
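The 02_clean and 03_augment steps in the chart can be sketched in base R as shown below. The sketch filters probes, pivots the data to a samples × probes table, and merges it with metadata by sample ID. The long-format column names (`sample`, `probe`, `expression`, `detection_p`), the `toy` values, and the `merged_data` name are hypothetical. The real GSE41080 files may be laid out differently. "Mean p < .05" is read here as the mean detection p-value of each probe across samples, which is an assumption.

```r
# Toy long-format expression data (hypothetical column names and values)
expr_long <- data.frame(
  sample      = rep(c("GSM1", "GSM2", "GSM3"), each = 3),
  probe       = rep(c("ILMN_A", "ILMN_B", "ILMN_C"), times = 3),
  expression  = c(8.1, 5.2, 9.0, 7.9, 5.0, 9.3, 8.4, 5.1, 8.8),
  detection_p = c(0.01, 0.60, 0.00, 0.02, 0.40, 0.01, 0.01, 0.70, 0.02)
)

# 02_clean (assumed rule): keep probes whose mean detection p-value is < 0.05
mean_p <- tapply(expr_long$detection_p, expr_long$probe, mean)
keep <- names(mean_p)[mean_p < 0.05]
expr_clean <- expr_long[expr_long$probe %in% keep, ]

# 03_augment: pivot to a wide samples x probes matrix
expr_wide <- tapply(expr_clean$expression,
                    list(expr_clean$sample, expr_clean$probe), mean)

# Merge expression with (hypothetical) metadata by sample ID
metadata <- data.frame(sample = c("GSM1", "GSM2", "GSM3"),
                       Age_Group = c("Young", "Older", "Older"))
merged_data <- merge(metadata,
                     data.frame(sample = rownames(expr_wide), expr_wide,
                                check.names = FALSE),
                     by = "sample")
merged_data
```

In this toy example, `ILMN_B` is dropped because its mean detection p-value is well above 0.05. The merged table has one row per sample and one column per retained probe.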

Data description

library(tidyverse)

# Median (range) of age and BMI within each age group
numeric_summary <- analysis_data |>
  group_by(Age_Group) |>
  summarize(
    Age_Range = str_c(
      round(median(Age, na.rm = TRUE), 1),
      " (", min(Age, na.rm = TRUE), "–", max(Age, na.rm = TRUE), ")"),
    BMI_Range = str_c(
      round(median(BMI, na.rm = TRUE), 1),
      " (", min(BMI, na.rm = TRUE), "–", max(BMI, na.rm = TRUE), ")"),
    .groups = "drop")
# A tibble: 2 × 3
  Age_Group Age_Range  BMI_Range       
  <chr>     <chr>      <chr>           
1 Older     78 (61–93) 25.1 (18–47.3)  
2 Young     24 (20–30) 22.9 (18.8–43.6)


# get_categorical_summary() is a project helper (not shown on this slide) that
# returns counts and within-age-group percentages for one categorical variable
gender_table <- get_categorical_summary(analysis_data, Gender) |>
  mutate(Variable = "Gender")
cmv_table <- get_categorical_summary(analysis_data, Cytomegalovirus) |>
  mutate(Variable = "Cytomegalovirus")
ebv_table <- get_categorical_summary(analysis_data, EpsteinBarrvirus) |>
  mutate(Variable = "EpsteinBarrvirus")

categorical_summary <- bind_rows(gender_table, cmv_table, ebv_table)

# One row per variable level, one column per age group
final_categorical_table <- categorical_summary |>
  pivot_wider(names_from = Age_Group,
              values_from = Value,
              id_cols = c(Variable, Characteristic),
              names_sort = TRUE)
# A tibble: 6 × 4
  Variable         Characteristic Older    Young   
  <chr>            <chr>          <chr>    <chr>   
1 Gender           Female         40 (67%) 14 (48%)
2 Gender           Male           20 (33%) 15 (52%)
3 Cytomegalovirus  Negative       24 (40%) 13 (45%)
4 Cytomegalovirus  Positive       36 (60%) 16 (55%)
5 EpsteinBarrvirus Negative       19 (32%) 13 (45%)
6 EpsteinBarrvirus Positive       41 (68%) 16 (55%)
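`get_categorical_summary()` is not defined on the slide. The table above shows counts with percentages computed within each age group. A minimal base-R sketch of a helper that could produce such a table is given below. The implementation and the `toy` data are assumptions, not the project's actual code.

```r
# Hypothetical reimplementation of the project's helper (the real one is not shown).
# Returns one row per Age_Group x level of `var`, formatted as "n (pct%)", where
# the percentage is computed within each age group.
get_categorical_summary <- function(data, var) {
  var_name <- deparse(substitute(var))  # allows bare column names, e.g. Gender
  counts <- as.data.frame(
    table(Age_Group = data$Age_Group, Characteristic = data[[var_name]]),
    stringsAsFactors = FALSE
  )
  group_totals <- ave(counts$Freq, counts$Age_Group, FUN = sum)
  counts$Value <- sprintf("%d (%.0f%%)", counts$Freq,
                          100 * counts$Freq / group_totals)
  counts[, c("Age_Group", "Characteristic", "Value")]
}

# Usage on a small made-up data set
toy <- data.frame(Age_Group = c("Older", "Older", "Older", "Young", "Young"),
                  Gender    = c("Female", "Female", "Male", "Male", "Female"))
get_categorical_summary(toy, Gender)
```

Computing percentages within each age group, rather than over the whole cohort, is what makes each age-group column of the table sum to 100%.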

PCA
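A minimal sketch of the PCA step (06_analysis2) using base R's `prcomp()` follows. It assumes a samples × probes matrix like the one produced by the pivot in 03_augment. The matrix here is simulated. Scaling each probe to unit variance (`scale. = TRUE`) is a choice made for the sketch and may not match the settings behind the slide's figure.

```r
set.seed(1)
# Simulated stand-in for the wide expression data: 20 samples x 50 probes
expr_mat <- matrix(rnorm(20 * 50), nrow = 20,
                   dimnames = list(paste0("S", 1:20), paste0("probe_", 1:50)))
age_group <- rep(c("Young", "Older"), each = 10)
# Shift a few probes in the older group so there is some structure to find
expr_mat[age_group == "Older", 1:5] <- expr_mat[age_group == "Older", 1:5] + 2

# Center and scale each probe, then compute principal components
pca <- prcomp(expr_mat, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# PC1 vs PC2 scores, coloured by age group
scores <- data.frame(pca$x[, 1:2], Age_Group = age_group)
group_f <- factor(scores$Age_Group)
plot(scores$PC1, scores$PC2, col = as.integer(group_f), pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * var_explained[2]))
legend("topright", legend = levels(group_f),
       col = seq_along(levels(group_f)), pch = 19)
```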



Plots from paper

Biomarkers for prediction of response

Based on the volcano plot, the most differentially expressed probes between good and poor responders are ILMN_1688780 and ILMN_1739792.
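The flow chart lists t-tests, fold change (FC), and FDR for the volcano-plot step (07_analysis3). A minimal per-probe sketch of those calculations is shown below. It assumes expression is already on a log2 scale, so the log2 fold change is a difference of group means. It uses Welch's two-sample t-test (the `t.test()` default) and Benjamini-Hochberg adjustment via `p.adjust()`. The data are simulated, and the 0.05 FDR colour cut-off is arbitrary.

```r
set.seed(42)
# Simulated stand-in: 200 probes (rows) x 16 samples (columns), log2 scale
n_probes <- 200
response <- factor(rep(c("Good", "Poor"), each = 8))
expr <- matrix(rnorm(n_probes * 16, mean = 8), nrow = n_probes,
               dimnames = list(paste0("probe_", seq_len(n_probes)), NULL))
# Give the first five probes a real difference between groups
expr[1:5, response == "Good"] <- expr[1:5, response == "Good"] + 1.5

good <- response == "Good"
volcano <- data.frame(
  probe = rownames(expr),
  # Difference of group means on the log2 scale = log2 fold change (Good vs Poor)
  log2FC = rowMeans(expr[, good]) - rowMeans(expr[, !good]),
  # Welch two-sample t-test for each probe
  p_value = apply(expr, 1, function(x) t.test(x[good], x[!good])$p.value)
)
# Benjamini-Hochberg false discovery rate across all probes
volcano$FDR <- p.adjust(volcano$p_value, method = "BH")
volcano <- volcano[order(volcano$p_value), ]

# Volcano plot: effect size vs significance
plot(volcano$log2FC, -log10(volcano$p_value), pch = 19,
     col = ifelse(volcano$FDR < 0.05, "red", "grey50"),
     xlab = "log2 fold change (Good - Poor)", ylab = "-log10(p-value)")
head(volcano)
```

Probes in the upper-left and upper-right corners combine a large fold change with a small p-value. These are the candidates a volcano plot highlights.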

To determine which of these two probes could best predict vaccine response, we compare their pre-vaccine expression in the two response groups using boxplots.

ILMN_1688780 shows the clearest, non-overlapping difference in the distribution of pre-vaccine expression between the two groups.
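A sketch of the boxplot comparison follows. It adds one simple way to put a number on "non-overlapping": the overlap between the two groups' interquartile ranges (the boxes), where 0 means the boxes do not overlap. The function name `compare_probes()`, the `Response` column, and the `toy` values are hypothetical. With the real data, the call would look like `compare_probes(analysis_data, c("ILMN_1688780", "ILMN_1739792"))`, assuming `analysis_data` holds a response-group column with that name.

```r
# Draw side-by-side boxplots of each probe by response group and return how
# much the two groups' interquartile ranges overlap (0 = boxes do not overlap).
compare_probes <- function(data, probes, group_col = "Response") {
  op <- par(mfrow = c(1, length(probes)))
  on.exit(par(op))
  overlap <- numeric(length(probes))
  names(overlap) <- probes
  for (p in probes) {
    x <- data[[p]]
    g <- factor(data[[group_col]])
    boxplot(x ~ g, main = p, xlab = group_col, ylab = "Pre-vaccine expression")
    q <- tapply(x, g, quantile, probs = c(0.25, 0.75))
    overlap[p] <- max(0, min(q[[1]][2], q[[2]][2]) - max(q[[1]][1], q[[2]][1]))
  }
  overlap
}

# Made-up example: probe_A separates the groups, probe_B does not
toy <- data.frame(
  Response = rep(c("Good", "Poor"), each = 4),
  probe_A  = c(9.0, 9.2, 9.4, 9.6, 7.0, 7.2, 7.4, 7.6),
  probe_B  = c(8.0, 8.4, 8.8, 9.2, 7.8, 8.2, 8.6, 9.0)
)
compare_probes(toy, c("probe_A", "probe_B"))
```

IQR overlap is only a descriptive summary and not a test. Whiskers or individual points can still overlap even when the boxes do not.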

